Wasserstein k-means++ for Cloud Regime Histogram Clustering
Authors
Abstract
Much work has sought to discern the different types of cloud regimes, typically via Euclidean k-means clustering of histograms. However, these methods ignore the underlying similarity structure of cloud types. Wasserstein k-means clustering is a promising candidate for utilizing this structure during clustering, but existing algorithms do not scale well and lack the quality guarantees of the Euclidean case. We resolve this by generalizing k-means++ guarantees to the Wasserstein setting and providing a scalable minibatch algorithm for Wasserstein k-means. Our methods empirically perform well and lead to new, different cloud regime prototypes.
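As a rough illustration of the seeding idea, the following Python sketch applies k-means++ style D^2 sampling with a Wasserstein distance in place of the Euclidean one. It is not the authors' algorithm. It assumes, purely for simplicity, that each histogram is normalized and lives on a shared, equally spaced one-dimensional grid, so that the 1-Wasserstein distance reduces to the summed absolute difference of cumulative sums. The names `wasserstein_1d` and `wasserstein_kmeanspp_seed` are hypothetical, and the minibatch update step mentioned in the abstract is not shown.

```python
import numpy as np


def wasserstein_1d(p, q, bin_width=1.0):
    """1-Wasserstein distance between two normalized histograms.

    Assumption: both histograms share the same equally spaced 1D bins,
    so W1 equals bin_width times the L1 distance between their CDFs.
    """
    return bin_width * np.abs(np.cumsum(p) - np.cumsum(q)).sum()


def wasserstein_kmeanspp_seed(hists, k, rng):
    """Pick k seed indices by D^2 sampling under the Wasserstein distance.

    hists: array of shape (n, bins), each row summing to 1.
    Returns a list of k row indices to use as initial prototypes.
    """
    n = hists.shape[0]
    # First seed is drawn uniformly at random.
    centers = [int(rng.integers(n))]
    # Squared distance from every histogram to its nearest chosen seed.
    d2 = np.array([wasserstein_1d(h, hists[centers[0]]) ** 2 for h in hists])
    while len(centers) < k:
        total = d2.sum()
        if total == 0.0:
            # All remaining points coincide with a seed: fall back to uniform.
            nxt = int(rng.integers(n))
        else:
            # Sample proportionally to the squared distance to the nearest seed.
            nxt = int(rng.choice(n, p=d2 / total))
        centers.append(nxt)
        new_d2 = np.array([wasserstein_1d(h, hists[nxt]) ** 2 for h in hists])
        d2 = np.minimum(d2, new_d2)
    return centers


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in data: 20 random histograms over 6 bins.
    hists = rng.dirichlet(np.ones(6), size=20)
    print(wasserstein_kmeanspp_seed(hists, k=3, rng=rng))
```

Cloud regime histograms are usually joint histograms over more than one variable, in which case the closed form above no longer applies and the distance would have to come from an optimal transport solver with an explicit ground cost between bins. The sampling loop itself would stay the same.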
Similar Resources
Dynamic Clustering of Histogram Data Based on Adaptive Squared Wasserstein Distances
This paper deals with clustering methods based on adaptive distances for histogram data using a dynamic clustering algorithm. Histogram data describe individuals in terms of empirical distributions. This kind of data can be considered as complex descriptions of phenomena observed on complex objects: images, groups of individuals, spatial or temporal variant data, results of queries, environme...
K-Histograms: An Efficient Clustering Algorithm for Categorical Dataset
Clustering categorical data is an integral part of data mining and has attracted much attention recently. In this paper, we present k-histogram, a new efficient algorithm for clustering categorical data. The k-histogram algorithm extends the k-means algorithm to the categorical domain by replacing the means of clusters with histograms, and dynamically updates histograms in the clustering process. E...
Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm
Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group/cluster are more similar in some sense or another. It is a main task of exploratory data mining and a common technique for statistical data analysis. This paper proposes an improved version of the K-Means algorithm, namely Persistent K...
Cloud Properties over the North Slope of Alaska: Identifying the Prevailing Meteorological Regimes
Long time series of Arctic atmospheric measurements are assembled into meteorological categories that can serve as test cases for climate model evaluation. The meteorological categories are established by applying an objective k-means clustering algorithm to 11 years of standard surface-meteorological observations collected from 1 January 2000 to 31 December 2010 at the North Slope of Alaska (N...
Detection and tracking of gas plumes in LWIR hyperspectral video sequence data
Automated detection of chemical plumes presents a segmentation challenge. The segmentation problem for gas plumes is difficult due to the diffusive nature of the cloud. The advantage of hyperspectral images over conventional RGB imagery in the gas plume detection problem is the presence of non-visual data, which allows for a richer representation of information. In this paper we pre...